Abstract

A central problem in parallel processing is the determination of an effective partitioning of workload
to processors. The effectiveness of any given partition is dependent on the stochastic nature
of the workload. We treat the problem of determining when and if the stochastic behavior of the 
workload has changed enough to warrant the calculation of a new partition. We model the prob- 
lem as a Markov decision process, and derive an optimal decision policy. Quantification of this 
policy is usually intractable; we empirically study a heuristic policy which performs nearly 
optimally. Our results suggest that the detection of change is the predominant issue in this prob- 
lem. 
I. Introduction

We consider a multiprocessor system without a dynamic scheduling facility; e.g., a loosely 
coupled message passing system. We presume that some computation has been partitioned onto 
the processors, and that this partition cannot be easily changed while the computation is executing.
The computation’s execution time under this partition is presumed to be dependent on the
stochastic behavior of the computation. As long as the stochastic behavior is the same as when 
the partition was chosen, we suppose that a different partition will not yield an execution time 
performance gain. If the stochastic behavior does change, a new partition may better exploit the 
new behavior. The problem is to detect change in the computation’s stochastic behavior, deter- 
mine the performance benefits of implementing a new partition, and weigh those benefits against 
the overhead costs of calculating and implementing a new partition. We treat the repartitioning 
decision problem as a Markov decision process. This problem should be distinguished from the 
partitioning (or task assignment) problem, which addresses how to partition a workload among 
processors. Rather, we are examining the issue of when to abandon one partition and adopt 
another. These two problems are closely related, as the overall performance depends both on how
well and how often a workload is partitioned. Previous work in partitioning has focused only on 
the first problem. Our work considers how often a partitioning algorithm should be applied, as a 
function of the quality of the partitioning and the behavior of the computation under that parti- 
tioning. 

Consider a computation which can be described as a sequence of "cycles" which are
probabilistically identical, and which are independent of the computation’s partitioning. Examples of
such computations include iterative numerical methods, real-time systems which execute periodic 
monitoring tasks, and simulation programs [12]. The partitioning of such computations for paral- 
lel processing may take into account many factors, e.g. data dependencies, and system resource 
requirements. Once a partition is chosen, it may be quite difficult and/or expensive to dynami- 
cally move small portions of the computation’s workload. Yet, the chosen partition may become 
quite unsatisfactory if the stochastic nature of the computation changes. We are then presented 
with the problem of needing to repartition, but are constrained to devising a completely new par- 
tition. The related literature falls into two categories, neither of which addresses this particular 
problem. The work reported in [5], [6], [9], and [25] essentially presumes that jobs arrive at a cen- 
tral dispatcher which assigns jobs to processors. Our problem eschews the job arrival model, and 
does not allow a dynamic routing mechanism. A more recent body of work including [7], [14], 
[22], [23], [26] allows decentralized assignment decisions to be made dynamically. Again, our 
problem presumes that incremental dynamic reassignment is not feasible. Static and dynamic 
task assignment algorithms are presented in [1], [2], [4], [10], [11], [13], [17], [24]. The dynamic 
assignment algorithms consider restricted classes of computations; the static partitioning algo- 
rithms might be used in conjunction with our repartitioning decision policy if the computations 
partitioned by such methods change their behavior. Our work is a variation on aspects of the 
broad treatment of change detection under uncertainty given in [18]. Our model modifies this 
analysis by using a different decision cost structure, and by assuming a random computation 
duration. The main contribution of our work is the first application of statistical analysis and
Markov decision theory to the performance issue of when to reconfigure a workload distribution in a
parallel processing environment.

Suppose we measure the performance of a partitioned cyclic computation by observing the 
sum of processor utilizations over every cycle. Letting $Z_i$ denote the $i$th utilization measurement,
we suppose that the sequence $Z_1, Z_2, \ldots$ forms a weakly stationary stochastic process [20].
Intuitively, this means that every $Z_i$ and $Z_j$ have the same mean, the same variance, and that
their covariance depends only on $|i - j|$. We anticipate that at some unknown future time the
stochastic behavior of this computation will change, at which point it may be advantageous to 
redistribute the computational workload. A change of this sort can be expected if the probabilistic
behavior of the inputs driving the system changes; such a change could also occur as a result of
an intrinsic change in the behavior of the computation. In this paper we consider the problem of 
detecting and reacting to such a change. We develop a real-time change monitoring decision pro- 
cess which balances the expected costs of redistribution against the expected benefits. Within the 
constraints imposed by the change detection method, the decision policies we describe are shown 
to minimize the computation’s expected execution time. 

In section II we present a statistical means of determining when a computation’s stochastic 
behavior changes. Section III shows how to dynamically maintain the probability of a change 
having already occurred. Section IV discusses our assumptions about the duration of the compu- 
tation; section V formulates the repartitioning decision problem as a Markov decision process. 
Sections VI and VII analyze this model, and characterize a decision policy which minimizes the 
expected computation execution time. As the calculation of the optimal decision policy is usually 
intractable, we propose a heuristic policy in section VIII. Section IX reports empirical results 
which suggest that our heuristic tends to perform very close to optimally. Section X discusses
the possibility of multiple changes during a computation, and section XI contains our conclusions.

II. Detecting Change 

In this section we discuss a statistical approach to detecting change in the computation’s 
stochastic behavior. The proposed method is not an integral part of our dynamic partitioning
solution; it could be replaced with any other statistical change detection technique. Our method
does have the advantage of computational simplicity with minimal storage requirements. 

The change monitoring process examines the sequence $\{Z_i\}$ of performance measures. Each
random variable $Z_i$ is drawn from a common distribution $F$ whose exact characterization is not
known. We say that a change occurs at cycle $j$ if $Z_j$’s distribution is different from $Z_{j-1}$’s. We
presume that at most one change will occur in the course of the computation. The probabilistic
nature of this problem requires us to statistically determine when a change occurs. However, $F$ is
general, and the performance observations are correlated. For statistical tractability, we presume
that the sequence $\{Z_i\}$ is first transformed into a sequence $\{X_i\}$ of approximately normal and
independent observations. This is accomplished using the batch means [8] transformation, where a
sequence of $d$ observations is replaced by its mean value. Thus the transformed observation $X_i$ is
defined by

$$X_i = \frac{1}{d} \sum_{j=0}^{d-1} Z_{d \cdot i + j}.$$

If $Z_{d \cdot i + j}$ has mean $\mu$ and variance $\sigma^2$ for $j = 0, \ldots, d-1$, then $X_i$ is approximately normal with
mean $\mu$ and variance $\sigma^2 / d$.
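
The batch means transformation can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `batch_means` and the choice to discard a trailing partial batch are assumptions made here for the example.

```python
from statistics import mean


def batch_means(z, d):
    """Replace each run of d consecutive observations by its mean.

    Computes X_i = (1/d) * sum_{j=0}^{d-1} Z_{d*i+j} for every complete
    batch. A trailing partial batch (fewer than d observations) is
    discarded; that choice is an assumption of this sketch.
    """
    if d <= 0:
        raise ValueError("batch size d must be positive")
    n_batches = len(z) // d
    return [mean(z[i * d:(i + 1) * d]) for i in range(n_batches)]


# Hypothetical per-cycle utilization measurements, batch size d = 3.
utilizations = [0.8, 0.6, 0.7, 0.9, 0.5, 0.7, 0.6]
transformed = batch_means(utilizations, 3)  # two complete batches
```

Only the running sum of the current batch needs to be stored in an online setting, which is consistent with the minimal storage requirement noted above.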

Our change monitor examines the observations $\{X_i\}$ as they become available to determine
if and when their underlying normal distribution changes. Solutions to the so-called model 
identification problem [3] in statistics can be used to detect this change. In our situation, model 
identification decides whether two groups of independent normal observations are drawn from a 
single normal distribution. 
Using a model identification approach, we create a test cluster of $c$ adjacent normal observations.
The $i$th cluster $C_i$ is defined to be the $i$th such collection of $c$ adjacent observations.

We assume the existence of a base cluster $B$ of size $c$ derived from initial observations taken
before a change could have occurred. The model identification test determines whether observations
in $B$ and $C_i$ are identically distributed. Positive indication of distributional difference is
evidence for the change having already occurred. The AIC model identification approach described
in [3] attempts to describe the data set $B \cup C_i$ with a probabilistic model having as few
parameters as possible while still fitting the data. We consider two competing models of $B \cup C_i$. One
model states that $B \cup C_i$ is a set of $2 \cdot c$ observations drawn from a normal distribution
$N(\mu_1, \sigma_1^2)$. This model has two parameters, $\mu_1$ and $\sigma_1$. The competing four parameter
model states that $B$ is a set of normal $N(\mu_B, \sigma_B^2)$ observations and that $C_i$ is a set of
normal $N(\mu_C, \sigma_C^2)$ observations. For each model we calculate the statistic

$$\mathrm{AIC} = -2 \cdot l(P) + 2 \cdot P$$

where P is the number of model parameters, and l(P) is the maximized log-likelihood function 
[12] using P parameters. The model achieving the minimum AIC statistic is taken as the most 
parsimonious. 

The statistic $\mathrm{AIC}_1$ for the two parameter model is given by

$$\mathrm{AIC}_1 = 2c \cdot \ln(\hat\sigma_1^2) + 4$$

and the competing model’s statistic is given by

$$\mathrm{AIC}_2 = c \cdot \ln(\hat\sigma_B^2 \cdot \hat\sigma_C^2) + 8$$

where $\hat\sigma_1^2$, $\hat\sigma_B^2$, and $\hat\sigma_C^2$ are the sample variances for the sets $B \cup C_i$, $B$, and $C_i$
respectively, and an additive constant common to both statistics is omitted. One
advantage of the AIC model selection criterion is its simplicity. The significance level of the
statistic is effectively chosen by its derivation, and no statistical tables need to be stored.
Furthermore, the calculations take linear time in the size of the clusters.
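
A sketch of the two-model comparison follows. It evaluates AIC = -2 l(P) + 2P directly from the maximized normal log-likelihood rather than from a simplified closed form. The function names are hypothetical, the variance estimate is assumed to be the maximum-likelihood (biased) one, and every cluster is assumed to have nonzero sample variance.

```python
import math


def mle_variance(xs):
    """Maximum-likelihood (biased) variance estimate of a sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def aic_normal(groups):
    """AIC = -2*l + 2*P when each group is fitted by its own normal law.

    For n observations the maximized log-likelihood satisfies
    -2*l = n * (ln(2*pi*var) + 1). Each group contributes two
    parameters (mean and variance). Assumes nonzero sample variance.
    """
    neg2_loglik = 0.0
    for g in groups:
        neg2_loglik += len(g) * (math.log(2 * math.pi * mle_variance(g)) + 1)
    return neg2_loglik + 2 * (2 * len(groups))


def indicates_change(base, test):
    """True when the four-parameter model attains the smaller AIC."""
    aic_single = aic_normal([list(base) + list(test)])   # P = 2
    aic_split = aic_normal([list(base), list(test)])     # P = 4
    return aic_split < aic_single


# Illustrative clusters of size c = 6 (made-up numbers).
base_cluster = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
shifted_cluster = [5.0, 5.1, 4.9, 5.0, 5.05, 4.95]
```

Each call makes a constant number of passes over the clusters, so the work is linear in the cluster size, in line with the remark above.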

The model identification criterion may fail to correctly determine whether a cluster $C_i$
represents the existence of a change. In a Bayesian framework, the AIC test gives us some
indication of change, with uncertainty. If we have a prior probability of change, we can calculate a
posterior probability of change as a function of the test result. This calculation requires knowledge of
the statistical test’s accuracy. We let $\alpha$ denote the probability that the model selection statistic
falsely indicates a change (type I error); we let $\beta$ denote the probability that the statistic fails to
detect a change (type II error).

III. Calculating the Probability of Change

We now consider the calculation of the probability of change. A measurement of system
utilization is taken every cycle. The mechanics of the batch means transformation produce a
transformed observation every $d$ cycles; a new test cluster $C$ is thus available every $c \cdot d$ cycles.
The beginning of cycle $n \cdot c \cdot d$ is called the $n$th decision step or time $n$, and is the $n$th epoch at
which a test cluster is evaluated for change. The $c \cdot d$ cycles between decision steps are known as a
decision interval. We define $p_n$ to be the probability that a change has occurred by decision step
$n$. $p_n$ is calculated as a function of $p_{n-1}$, and the result of the statistical change test performed at
time $n$. This function’s description requires the evaluation of $p^*(p_{n-1})$, the probability that a
change will have occurred by time $n$, given only the value of $p_{n-1}$. $p^*(p_{n-1})$ can be calculated
at time $n-1$, but must presume some prior knowledge of the distribution of the time of change.
Supposing that such foreknowledge is not precisely known, it is both reasonable and convenient to
assume that the failure rate of the time of change distribution is some constant $\phi$. By
conditioning on the time of change, we have

$$p^*(p_{n-1}) = p_{n-1} + (1 - p_{n-1}) \cdot \phi = (1 - \phi) \cdot p_{n-1} + \phi. \tag{1}$$

$p^*(p_{n-1})$ is the probability of change by time $n$, before a change test is performed at time $n$. $p_n$
depends both on $p^*(p_{n-1})$ and the result of the change test performed at decision step $n$. The
posterior probability $p_n$ is calculated using Bayes’ Theorem [21]. If the test at decision step $n$
indicates change, $p_n$ is given by $p^c(p_{n-1})$:

$$p_n = p^c(p_{n-1}) = \frac{p^*(p_{n-1}) \cdot (1 - \beta)}{p^*(p_{n-1}) \cdot (1 - \beta) + (1 - p^*(p_{n-1})) \cdot \alpha}. \tag{2}$$

Given a negative indication of change, $p_n$ is defined by $p^{\bar c}(p_{n-1})$:

$$p_n = p^{\bar c}(p_{n-1}) = \frac{p^*(p_{n-1}) \cdot \beta}{p^*(p_{n-1}) \cdot \beta + (1 - p^*(p_{n-1})) \cdot (1 - \alpha)}. \tag{3}$$

As the test clusters become available for statistical testing, equations (1)-(3) allow us to maintain
the probability that the change has already occurred. 
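
The update can be written out as a short sketch: the pre-observation probability is $p + (1 - p) \cdot \phi$, and Bayes' Theorem is then applied with the type I and type II error probabilities. Function names and the numeric parameter values are illustrative assumptions, and the error probabilities are assumed to lie strictly between 0 and 1 so that the denominators are nonzero.

```python
def prior_next(p, phi):
    """Probability of change by the next decision step, before testing.

    Assumes a constant failure rate phi for the time of change.
    """
    return p + (1.0 - p) * phi


def posterior(p_prev, positive, alpha, beta, phi):
    """Posterior probability of change after one change test.

    alpha: probability the test falsely indicates change (type I error)
    beta:  probability the test misses a change (type II error)
    """
    p_star = prior_next(p_prev, phi)
    if positive:
        num = p_star * (1.0 - beta)
        den = num + (1.0 - p_star) * alpha
    else:
        num = p_star * beta
        den = num + (1.0 - p_star) * (1.0 - alpha)
    return num / den


def track(p0, test_results, alpha, beta, phi):
    """Fold a sequence of test results (True = change indicated) into p_n."""
    p, history = p0, []
    for result in test_results:
        p = posterior(p, result, alpha, beta, phi)
        history.append(p)
    return history


# Illustrative parameters only: three quiet steps, then three positives.
history = track(0.0, [False, False, False, True, True, True],
                alpha=0.1, beta=0.2, phi=0.05)
```

Only the current value of the probability has to be stored between decision steps.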

At each decision step we will decide whether to repartition the computational load. This 
decision should be based on the costs and benefits of repartitioning. Presuming we receive no
substantial performance benefits if a change has not occurred, we see that the probabilities $\{p_n\}$
should play a pivotal role in determining whether (and when) repartitioning is worthwhile. Sec- 
tions V and VI confirm this intuition. 


IV. Number of Cycles 

We presume that the computation will require a random number $N$ of decision steps,
irrespective of its partitioning. We assume that $N$ is bounded above by some constant $M$. Our analysis
also assumes that the distribution of $N$ has an increasing failure rate function. That is, the failure
rate probability

$$\frac{\mathrm{Prob}\{N = n\}}{\mathrm{Prob}\{N \ge n\}}$$

is an increasing function of $n$. An intuitive explanation of this requirement is that the longer the
computation continues, the more likely it is that the computation will stop with the next decision
step.

We use an equivalent statement of increasing failure rate probabilities. To express this
equivalency, we first define the $\ge_{st}$ (stochastically larger) relation between random variables. If $X$
and $Y$ are random variables, we say that $X \ge_{st} Y$ if for all increasing functions $g$,
$E[g(X)] \ge E[g(Y)]$. Let $N_n$ denote the random variable $N$ conditioned on $N \ge n$. [20] shows that
assuming $N$ has an increasing failure rate function is equivalent to assuming that $N_n \ge_{st} N_{n+1}$ for
all $n \ge 0$.

Finally, we let $\bar N_n$ denote the expected value of $N$, given that $N \ge n$.
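
For a concrete distribution of the number of decision steps, the increasing failure rate assumption and the conditional expectation can be checked numerically. This is a small sketch under the assumption that the failure rate at n is Prob{N = n} / Prob{N >= n}; the function names and the example distributions are hypothetical.

```python
def failure_rates(pmf):
    """Failure rate Prob{N = n} / Prob{N >= n} for n = 0, ..., M.

    pmf[n] is Prob{N = n}; entries are assumed to sum to one.
    """
    rates = []
    tail = sum(pmf)              # Prob{N >= 0}
    for prob in pmf:
        if tail <= 0:
            break                # no mass left beyond this point
        rates.append(prob / tail)
        tail -= prob
    return rates


def has_increasing_failure_rate(pmf, tol=1e-12):
    """True when the failure rate never decreases (up to tol)."""
    rates = failure_rates(pmf)
    return all(b >= a - tol for a, b in zip(rates, rates[1:]))


def conditional_mean(pmf, n):
    """Expected value of N given N >= n."""
    tail = sum(pmf[n:])
    return sum(m * pmf[m] for m in range(n, len(pmf))) / tail


uniform_pmf = [0.25, 0.25, 0.25, 0.25]   # failure rates 1/4, 1/3, 1/2, 1
bimodal_pmf = [0.5, 0.1, 0.4]            # failure rate dips at n = 1
```

A uniform distribution on a bounded range satisfies the assumption, while a distribution with an early spike followed by a lull does not.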

V. A Markov Decision Process 

Our notation concerning Markov decision processes is taken largely from [19]. Consider a 
stochastic process whose state we observe at each of a sequence of times $t = 0, 1, \ldots$. Let $I$ be
the set of all possible states. At each time $t$, the state of the process is discerned to be some
$s \in I$. Then a decision is made, choosing some action $a$ from a finite set $A$; the choice of action $a$
while in state $s$ incurs a cost $c(s,a)$. $c(s,a)$ may be random; we assume that $E[c(s,a)]$ is finite for
all states $s$ and actions $a$. The decision process then passes into another state. The probability
$P_{sq}(a)$ of passing into state $q$ from $s$ is dependent on the action $a$ chosen in state $s$. The expected
total cost of a decision policy is the expected sum of the costs incurred at each decision step. An
optimal decision policy minimizes the expected total cost.

We restrict our attention to the class of stationary decision policies, those policies which are 
deterministic functions of the discerned state. A useful theorem concerning optimal stationary 
decision policies is given by [19]. 

THEOREM 1 : Let V(s) be the expected total cost of the process which starts in state s, and 
which is governed by the optimal stationary policy. Then, 

$$V(s) = \min_{a \in A} \Big\{ c(s,a) + \sum_{q \in I} P_{sq}(a) \cdot V(q) \Big\}. \tag{4}$$

□ 

The function V(s) is known as the optimal cost function. From state s, the optimal stationary 
decision is the choice of action which minimizes the right hand side of equation (4). 

We now formulate the repartitioning decision problem as a Markov decision process. Table
I summarizes the notation used by this decision process formulation. The decision process time
steps will be precisely the decision steps. The decision costs are functions (in part) of $e_0$ and $e_r$,
the expected execution times of a decision interval after a change, with the original and with the
new partition, respectively. We presume that $e_0 > e_r$. A state of the process has the form
$\langle p,n \rangle$, reflecting the decision step $n$, and posterior probability of change, $p = p_n$. The decision
process chooses to test, or to retain. The retain decision causes the current partition to be kept for
the next decision interval. Our cost structure is concerned only with execution times after a
change. Therefore, if the anticipated change has not occurred, the retain decision exacts cost 0.
Otherwise, the retain decision causes the next decision interval execution to have a mean
execution time of $e_0$. The expected cost of the retain decision in state $\langle p,n \rangle$ is $p \cdot e_0$.

The decision process may decide to test. This decision causes the running system to halt; a
new partition is calculated, and is tested against the old partition on recent workload profiles. We
let $D_d$ denote the delay caused by calculating and testing a new partition. Partition comparison
can be placed in a decision theoretic context, as described in [15]. If the new partition is found
superior, the change is considered to have occurred, and the decision process is considered to be
stopped. A stopped process incurs an additional time cost $D_r$ quantifying the time required to
implement the new partition. A stopped process also incurs an execution cost $e_r$ for every
remaining decision interval in the computation. At time $n$, the expected value of this execution cost is
equal to $e_r \cdot (\bar N_n - n + 1)$. If the change has occurred, the test decision incurs an expected cost
$D_d + D_r + e_r \cdot (\bar N_n - n + 1)$. A premature test decision incurs only the calculation delay $D_d$. In
this case the probability of change by time $n$ is taken to be zero, and the decision process
continues. The expected cost of choosing to test in state $\langle p,n \rangle$ is thus given by

$$D_d + p \cdot (e_r \cdot (\bar N_n - n + 1) + D_r).$$

Notation                      Definition
----------------------------  -----------------------------------------------------------------
$n$                           Decision step number
$N$                           Random number of decision steps
$M$                           Upper bound on $N$
$N_n$                         $N$ given $N \ge n$
$\bar N_n$                    $E[N_n]$
$e_0$                         Decision interval post-change execution time, original partition
$e_r$                         Decision interval post-change execution time, new partition
$D_d$                         Delay to calculate and test new partition
$D_r$                         Delay to implement new partition
$\alpha$                      Change test type I error
$\beta$                       Change test type II error
$\phi$                        Time of change failure rate probability
$p^*(p)$                      Pre-observation probability of change at next decision step
$p^c(p)$                      Posterior probability of change after positive change observation
$p^{\bar c}(p)$               Posterior probability of change after negative change observation
$q^c(p)$                      Probability of observing change next observation
$q^{\bar c}(p)$               Probability of not observing change next observation
$V(\langle p,n \rangle)$      Optimal cost function
$E_V(\langle p,n \rangle)$    Expected future value of $V(\langle \cdot,n+1 \rangle)$ from $\langle p,n \rangle$
$R(\langle p,n \rangle)$      Optimal future costs given retain at $\langle p,n \rangle$
$T(\langle p,n \rangle)$      Optimal future costs given test at $\langle p,n \rangle$

Table I

We now consider the state transition probabilities. The state following a retain decision
from $\langle p,n \rangle$ depends on the result of the change detection test to be performed at time $n + 1$.
From state $\langle p,n \rangle$, the probability of observing a change at time $n + 1$ is given by $q^c(p)$, found
by conditioning on the time of change:

$$q^c(p) = p^*(p) \cdot (1 - \beta) + (1 - p^*(p)) \cdot \alpha. \tag{5}$$

From $\langle p,n \rangle$ the probability $q^{\bar c}(p)$ of not observing a change at time $n + 1$ is just $1 - q^c(p)$.
Given a retain decision in $\langle p,n \rangle$, the decision process passes into state $\langle p^c(p),n+1 \rangle$ with
probability $q^c(p)$; it passes into state $\langle p^{\bar c}(p),n+1 \rangle$ with probability $q^{\bar c}(p)$.

The optimal cost function’s state transition component is concisely represented by the
following function. Let

$$E_V(\langle p,n \rangle) = q^c(p) \cdot V(\langle p^c(p),n+1 \rangle) + q^{\bar c}(p) \cdot V(\langle p^{\bar c}(p),n+1 \rangle). \tag{6}$$

$E_V(\langle p,n \rangle)$ is interpreted as the minimized expected future costs at time $n+1$, as seen from state
$\langle p,n \rangle$ after a retain decision.
The decision to retain in $\langle p,n \rangle$ incurs an expected cost $p \cdot e_0$; the minimal expected future
costs are given by $E_V(\langle p,n \rangle)$. The expected future cost of the policy which retains in $\langle p,n \rangle$,
and thereafter uses the optimal stationary policy is thus

$$R(\langle p,n \rangle) = p \cdot e_0 + E_V(\langle p,n \rangle). \tag{7}$$

Similarly, the decision to test in $\langle p,n \rangle$ incurs an expected cost $D_d + p \cdot (e_r \cdot (\bar N_n - n + 1) + D_r)$. No
other costs are incurred if the process stops. If instead the process rejects the new partition, the
probability of change is taken to be zero, and the state transition probabilities into time $n+1$ are
identical to those after a retain decision from state $\langle 0,n \rangle$. Thus the minimal expected future
costs in this case are simply $E_V(\langle 0,n \rangle)$. The expected future cost of the policy which tests in
$\langle p,n \rangle$ and thereafter uses the optimal stationary policy is thus

$$T(\langle p,n \rangle) = D_d + p \cdot (e_r \cdot (\bar N_n - n + 1) + D_r) + (1 - p) \cdot E_V(\langle 0,n \rangle). \tag{8}$$

In terms of equations (7) and (8), Theorem 1 states that

$$V(\langle p,n \rangle) = \min\{\, T(\langle p,n \rangle),\; R(\langle p,n \rangle) \,\}, \tag{9}$$

so that the optimal stationary decision in state $\langle p,n \rangle$ is to retain if and only if
$R(\langle p,n \rangle) \le T(\langle p,n \rangle)$.

Equations (6)-(9) illustrate the recursive relationship satisfied by the optimal cost function.
Since the number of decision steps is bounded above by $M$, we can define $V(\langle p,M+1 \rangle) = 0$ for
all $p \in [0,1]$, and can solve for $V(\langle p,n \rangle)$ when $0 \le n \le M$. We next show that the solution of
$V(\langle p,n \rangle)$ is nicely characterized without explicit quantification.
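
The recursion can be evaluated directly for small horizons. The sketch below makes a simplifying assumption that is not part of the model above: the number of decision steps is taken to be exactly M, so the conditional expectation of N given N >= n equals M. The function `make_cost_functions` and all parameter values are hypothetical, and the direct recursion branches at every step, so it is only practical for small M.

```python
from functools import lru_cache


def make_cost_functions(M, e0, er, Dd, Dr, alpha, beta, phi):
    """Build R, T and V for the retain/test decision, assuming N = M exactly."""

    def p_star(p):                       # pre-observation probability
        return p + (1.0 - p) * phi

    def q_pos(p):                        # probability of observing change
        ps = p_star(p)
        return ps * (1.0 - beta) + (1.0 - ps) * alpha

    def E_V(p, n):                       # expected optimal cost at step n + 1
        if n >= M:                       # boundary: V(<p, M+1>) = 0
            return 0.0
        ps, qc = p_star(p), q_pos(p)
        p_pos = ps * (1.0 - beta) / qc           # posterior after positive test
        p_neg = ps * beta / (1.0 - qc)           # posterior after negative test
        return qc * V(p_pos, n + 1) + (1.0 - qc) * V(p_neg, n + 1)

    def R(p, n):                         # retain now, optimal afterwards
        return p * e0 + E_V(p, n)

    def T(p, n):                         # test now; M - n + 1 intervals remain
        return Dd + p * (er * (M - n + 1) + Dr) + (1.0 - p) * E_V(0.0, n)

    @lru_cache(maxsize=None)
    def V(p, n):                         # optimal cost: cheaper of the two
        return min(T(p, n), R(p, n))

    return R, T, V


# Illustrative parameters only.
R, T, V = make_cost_functions(M=5, e0=10.0, er=6.0, Dd=2.0, Dr=3.0,
                              alpha=0.1, beta=0.2, phi=0.05)
retain_is_optimal = R(0.1, 2) <= T(0.1, 2)
```

Memoizing V avoids recomputing states reached through a rejected test, which always restart from probability zero.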

VI. Properties of V(<p,n>) 

We will demonstrate that the optimal stationary decision policy is given by a sequence
$\pi_0, \pi_1, \ldots$ of thresholds from the interval $[0, 1]$. The optimal decision in state $\langle p,n \rangle$ is to test
if and only if $p > \pi_n$. This structure is revealed by analysis of $T(\langle p,n \rangle)$ and $R(\langle p,n \rangle)$ for fixed
$n$ as functions of $p$. We will show that for every $n$, there exists $\pi_n$ such that whenever $p \le \pi_n$
then $R(\langle p,n \rangle) \le T(\langle p,n \rangle)$, and whenever $p > \pi_n$ then $R(\langle p,n \rangle) > T(\langle p,n \rangle)$. Our vehicle for
this result is the demonstration that for fixed $n$, $T(\langle p,n \rangle)$ is linear in $p$, and $R(\langle p,n \rangle)$ is
concave in $p$. We analyze the values of these functions at their endpoints and argue that $T(\langle p,n \rangle)$
and $R(\langle p,n \rangle)$ can intersect at most once, at $p = \pi_n$.

Some of our analysis conditions on the value of $N$. We use the notation $f(\langle p,n \rangle \mid N=m)$ to
denote the value of function $f$ at state $\langle p,n \rangle$ given that $N = m$.

Our first observation is that for any fixed $n$, $T(\langle p,n \rangle)$ is a linear function of $p$. This is
apparent from equation (8), as the value $E_V(\langle 0,n \rangle)$ is independent of $p$. Thus

LEMMA 1 : For fixed n, T(<p,n>) is a linear function of p. 

□ 


We next observe that for fixed $n$, $R(\langle p,n \rangle)$ is a piece-wise linear continuous concave (plcc)
function of p. This result follows primarily from the following lemma reported in [18] and stated 
in terms of our notation: 
LEMMA 2 : Suppose that $N = m$. If $V(\langle p,n+1 \rangle \mid N=m)$ is a plcc function of $p$, then
$E_V(\langle p,n \rangle \mid N=m)$ is a plcc function of $p$.

□ 

We use this lemma to establish another. 

LEMMA 3 : For every fixed $n \ge 0$, $V(\langle p,n \rangle)$ and $R(\langle p,n \rangle)$ are plcc functions of $p$.

PROOF: We first condition on $N = m$ for some $m$. We then inductively show that
$V(\langle p,n \rangle \mid N=m)$ and $R(\langle p,n \rangle \mid N=m)$ are plcc functions of $p$. For the base case we consider
$n = m$. For any $p \in [0,1]$,

$$R(\langle p,m \rangle \mid N=m) = p \cdot e_0 + E_V(\langle p,m \rangle \mid N=m) = p \cdot e_0$$

since $V(\langle p,m+1 \rangle \mid N=m) = 0$ for all $p$. Thus $R(\langle p,m \rangle \mid N=m)$ is plcc in $p$. We also observe
that $T(\langle p,m \rangle \mid N=m)$ is plcc since it is linear. The class of plcc functions is closed under the
pointwise minimum operation; $V(\langle p,m \rangle \mid N=m)$ must also be plcc, establishing the induction
base.

For the induction hypothesis we suppose that both $R(\langle p,n+1 \rangle \mid N=m)$ and $V(\langle p,n+1 \rangle \mid N=m)$
are plcc functions of $p$ for some $n \le m - 1$. Lemma 2, and the closure of plcc functions under
addition and pointwise minimum again ensure that $R(\langle p,n \rangle)$ and $V(\langle p,n \rangle)$ are plcc functions
of $p$, completing the induction.

To complete the proof, we note that the class of plcc functions is also closed under scalar
multiplication, and observe that

$$V(\langle p,n \rangle) = \sum_{m=0}^{M} \mathrm{Prob}\{N = m\} \cdot V(\langle p,n \rangle \mid N=m)$$

and

$$R(\langle p,n \rangle) = \sum_{m=0}^{M} \mathrm{Prob}\{N = m\} \cdot R(\langle p,n \rangle \mid N=m).$$


□ 


We next analyze the values of $T(\langle p,n \rangle)$ and $R(\langle p,n \rangle)$ at $p = 1$. We show that $V(\langle 1,n \rangle)$
is a well-behaved function of $n$.

LEMMA 4 : Either

(i) $V(\langle 1,n \rangle) = T(\langle 1,n \rangle)$ for all $n$ for which $\mathrm{Prob}\{N = n\} \ne 0$; or

(ii) $V(\langle 1,n \rangle) = R(\langle 1,n \rangle)$ for all $n$ for which $\mathrm{Prob}\{N = n\} \ne 0$; or

(iii) There exists an $n_0$ (possibly $\infty$) such that for all $n < n_0$, $V(\langle 1,n \rangle) = T(\langle 1,n \rangle)$, and for all
$n \ge n_0$ for which $\mathrm{Prob}\{N = n\} \ne 0$, $V(\langle 1,n \rangle) = R(\langle 1,n \rangle)$.

PROOF: We condition on $N = m$, for any $0 \le m \le M$. Let $K$ be the largest integer such that
$(e_0 - e_r) \cdot K \le D_d + D_r$. Simple algebra (omitted here) establishes the inductive proof that for all
$n$ such that $m - K < n \le m$,

$$V(\langle 1,n \rangle \mid N=m) = R(\langle 1,n \rangle \mid N=m) = e_0 \cdot (m - n + 1);$$

and that for $0 \le n \le m - K$,

$$V(\langle 1,n \rangle \mid N=m) = T(\langle 1,n \rangle \mid N=m) = D_d + D_r + e_r \cdot (m - n + 1)$$

and

$$R(\langle 1,n \rangle \mid N=m) = D_d + D_r + e_0 + e_r \cdot (m - n).$$

Define $d(n \mid N=m)$ to be the conditional difference $T(\langle 1,n \rangle \mid N=m) - R(\langle 1,n \rangle \mid N=m)$, and $d(n)$
to be the unconditional difference $T(\langle 1,n \rangle) - R(\langle 1,n \rangle)$. From the equations above, we see
that as a function of $m$,

$$d(n \mid N=m) = \begin{cases} D_d + D_r - (e_0 - e_r) \cdot (m - n + 1) & \text{for } m < n+K \\ -(e_0 - e_r) & \text{for } m \ge n+K \end{cases}$$

It follows from the definition of $K$ that $d(n \mid N=m)$ is a decreasing function of $m$. The
unconditional difference $d(n)$ is obtained by taking the expectation of $d(n \mid N=m)$ with respect to the
residual distribution $N_n$ of $N$ given $N \ge n$. Recalling our remarks in section IV, we have effectively
assumed that $N_n \ge_{st} N_{n+1}$ for all $n \ge 0$, implying that $E[g(N_n)] \le E[g(N_{n+1})]$ for all decreasing
functions $g$. In particular, $d(n) \le d(n+1)$, showing that the difference $T(\langle 1,n \rangle) - R(\langle 1,n \rangle)$ is
an increasing function of $n$. Then case (i) occurs if $d(n)$ is negative for all $n$, case (ii) occurs if
$d(n)$ is positive for all $n$, and case (iii) occurs if $d(n)$ changes sign at $n = n_0$.

□ 


We summarize the known behavior of $V(\langle p,n \rangle)$ as a function of $p$. $R(\langle p,n \rangle)$ is a plcc
function of $p$, and $T(\langle p,n \rangle)$ is linear in $p$. From this we infer that if $T(\langle 1,n \rangle) < R(\langle 1,n \rangle)$,
then the functional curves of $T(\langle p,n \rangle)$ and $R(\langle p,n \rangle)$ can intersect at most once for $p \in [0,1]$
(otherwise the concavity of $R(\langle p,n \rangle)$ is violated). The last lemma shows that either
$T(\langle 1,n \rangle) \le R(\langle 1,n \rangle)$ for all $n \ge 0$, or that $T(\langle 1,n \rangle) \le R(\langle 1,n \rangle)$ only if $n$ is less than some
threshold $n_0$ (potentially $\infty$). Furthermore, it is easily seen that

$$T(\langle 0,n \rangle) - R(\langle 0,n \rangle) = D_d > 0$$

for all $n$. These observations collectively establish the structure of the optimal stationary
decision policy for small enough $n$: if $n \le n_0$ let $\pi_n$ be the unique solution to the equation
$T(\langle p,n \rangle) = R(\langle p,n \rangle)$. The existence of this solution is ensured by continuity, and the fact that
$T$ exceeds $R$ at $p = 0$ while $R$ exceeds $T$ at $p = 1$. The optimal policy is to retain in all states
$\langle p,n \rangle$ such that $p \le \pi_n$, and to test in states $\langle p,n \rangle$ such that $p > \pi_n$. Figure I illustrates this
argument, showing plots of $R(\langle p,n \rangle)$ and $T(\langle p,n \rangle)$ as functions of $p$ when $n \le n_0$.

We complete the analysis of $V(\langle p,n \rangle)$’s behavior by supposing $n > n_0$, so that
$R(\langle 1,n \rangle) < T(\langle 1,n \rangle)$. The following lemma demonstrates that when $n > n_0$, $R(\langle p,n \rangle)$ is
linear in $p$. Since $T$ is linear and exceeds $R$ at $p = 0$ and $p = 1$ for such $n$, the functional curves
for $T(\langle p,n \rangle)$ and $R(\langle p,n \rangle)$ cannot intersect.

LEMMA 5 : If $n > n_0$, then $R(\langle p,n \rangle)$ is linear in $p$, and $V(\langle p,n \rangle) = R(\langle p,n \rangle)$ for all
$p \in [0,1]$.

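PROOF: We proceed by induction. $M$ is the largest integer such that $\mathrm{Prob}\{N = M\} \ne 0$, so
that $V(\langle p,M+1 \rangle) = 0$ for all $p$ and

$$V(\langle p,M \rangle) = \min\{\, D_d + p \cdot (e_r + D_r),\; p \cdot e_0 \,\}.$$

Presuming that $n_0 < M$, we have $V(\langle p,M \rangle) = R(\langle p,M \rangle) = p \cdot e_0$, which is linear in $p$.

[Figure I: $R(\langle p,n \rangle)$ and $T(\langle p,n \rangle)$ plotted as functions of $p$ over $[0,1]$.]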

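For the induction hypothesis, we suppose there is an $n > n_0$ such that
$V(\langle p,n+1 \rangle) = R(\langle p,n+1 \rangle)$ for all $p \in [0,1]$, and that $R(\langle p,n+1 \rangle)$ is linear in $p$. Equation (7)
implies that

$$R(\langle p,n \rangle) = p \cdot e_0 + q^c(p) \cdot V(\langle p^c(p),n+1 \rangle) + q^{\bar c}(p) \cdot V(\langle p^{\bar c}(p),n+1 \rangle),$$

and the induction hypothesis states that

$$V(\langle p,n+1 \rangle) = A \cdot p + B$$

for some $A$ and $B$. Equations (2), (3), and (5) show that

$$p^c(p) = \frac{p^*(p) \cdot (1 - \beta)}{q^c(p)} \quad \text{and that} \quad p^{\bar c}(p) = \frac{p^*(p) \cdot \beta}{q^{\bar c}(p)}.$$

It follows that

$$R(\langle p,n \rangle) = p \cdot e_0 + A \cdot p^*(p) + B = p \cdot [\, e_0 + A \cdot (1 - \phi) \,] + [\, A \cdot \phi + B \,].$$

Then $R(\langle p,n \rangle)$ is linear in $p$; since $T(\langle p,n \rangle)$ exceeds $R(\langle p,n \rangle)$ at both $p = 0$ and $p = 1$, it
follows directly that $T(\langle p,n \rangle)$ exceeds $R(\langle p,n \rangle)$ for all $p \in [0, 1]$. Thus
$V(\langle p,n \rangle) = R(\langle p,n \rangle)$, completing the induction.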

□ 


The discussions developed in this section prove our main analytic result. 

THEOREM 2 : For every $n$, there exists a $\pi_n \in [0,1]$ such that the optimal stationary decision
in state $\langle p,n \rangle$ is to retain if $p \le \pi_n$, and to test if $p > \pi_n$.

□ 
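
The threshold structure suggests a simple numerical procedure for small problems: at each decision step, compare the test and retain costs at p = 1, and if testing is cheaper there, bisect on p for the single crossing point. The sketch below is illustrative only. It assumes the number of decision steps is exactly M (so the expected value of N given N >= n is M), the function name `thresholds` and the parameter values are hypothetical, and the direct recursion is exponential in M.

```python
from functools import lru_cache


def thresholds(M, e0, er, Dd, Dr, alpha, beta, phi, tol=1e-6):
    """Approximate pi_0, ..., pi_M by bisection on T - R, assuming N = M."""

    def p_star(p):
        return p + (1.0 - p) * phi

    def q_pos(p):
        ps = p_star(p)
        return ps * (1.0 - beta) + (1.0 - ps) * alpha

    @lru_cache(maxsize=None)
    def V(p, n):
        return min(T(p, n), R(p, n))

    def E_V(p, n):
        if n >= M:                       # no cost after the last step
            return 0.0
        ps, qc = p_star(p), q_pos(p)
        p_pos = ps * (1.0 - beta) / qc
        p_neg = ps * beta / (1.0 - qc)
        return qc * V(p_pos, n + 1) + (1.0 - qc) * V(p_neg, n + 1)

    def R(p, n):                         # cost of retaining at <p, n>
        return p * e0 + E_V(p, n)

    def T(p, n):                         # cost of testing at <p, n>
        return Dd + p * (er * (M - n + 1) + Dr) + (1.0 - p) * E_V(0.0, n)

    result = []
    for n in range(M + 1):
        if T(1.0, n) >= R(1.0, n):
            result.append(1.0)           # retain for every p in [0, 1]
            continue
        lo, hi = 0.0, 1.0                # T > R at lo, T < R at hi
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if T(mid, n) > R(mid, n):
                lo = mid
            else:
                hi = mid
        result.append(0.5 * (lo + hi))
    return result


# Illustrative parameters only.
pis = thresholds(M=6, e0=10.0, er=6.0, Dd=2.0, Dr=3.0,
                 alpha=0.1, beta=0.2, phi=0.05)
```

With these made-up costs, repartitioning can no longer pay for itself at the final step, so the last threshold is 1 and the policy retains regardless of p.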
VII. Minimization of Expected Execution Time

We have constructed a decision process and exposed the structure of its optimal stationary 
policy. We next show that employment of the optimal decision policy minimizes the 
computation’s expected execution time. 

Our model formulation does not impose any execution costs until a change actually occurs. 
The optimal decision policy minimizes the expected sum of all overhead delays and all post- 
change execution delays. The overall expected finishing time is equal to the expected sum of all 
execution and overhead delays. However, the pre-change execution delays are independent of any 
decision policy. By minimizing the expected overhead and post-change execution delays, we 
minimize the expected finishing time. 

The optimality of our derived policy is conditioned on constant model parameters. 
Modification of these parameters may change the optimal decision policy without changing the 
computation in any functional sense. For example, we might decrease the batch means set size d 
or the AIC cluster size $c$ to decrease the decision interval length. This modification would
increase the responsiveness of the decision model to change, at the cost of increased error
probabilities $\alpha$ and $\beta$. This change in no way affects the functional behavior of the computation.
Likewise, our policy depends on both the quality of partitions created by the partitioning algorithm,
and that algorithm’s running time. Our decision policy would be a valuable tool in exploring the 
tradeoffs between partition quality and the run-time required to achieve that quality. 

VIII. A Repartitioning Heuristic 

In this section we note that solving for the precise optimal thresholds $\pi_n$ is not
computationally feasible, and examine a simple heuristic which nearly minimizes the expected computation
execution time. We furthermore observe that the timely detection of change is the most
important component of our repartitioning decision heuristic.

In theory, the optimal cost equations (9) can be solved recursively. M was defined to be the 
largest integer such that Prob{N — 0. The recursive solution begins with 

$$V(\langle p, M \rangle) = \min\{\, p\,e_0,\; D_d + p\,(e_r + D_r) \,\},$$

and then solves for decreasing n using equations (6)-(9). However, $V(\langle p, n \rangle)$ is
piecewise linear with the number of pieces tending to double at each step of the recursion. An
exact solution is not computationally feasible for any large M. Approximation techniques might be
employed, but even then the solution method could require more computation than its results
justify. Furthermore, our decision model impractically presumes that the values of $e_0$ and $e_r$ are known a priori.
We describe a heuristic which is based on the optimal decision policy structure. Empirical tests 
indicate that the heuristic yields total policy costs which are extremely close to the optimal 
policy’s cost. 
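
The starting point of the recursion is simple enough to evaluate directly. The following is a minimal sketch of the terminal cost only. The function names are hypothetical, reading the two branches of the minimum as "retain" and "test" is an interpretation, and the recursion through equations (6)-(9) is not reproduced here.

```python
def terminal_cost(p, e0, er, Dd, Dr):
    """Cost at the final step M: the cheaper of the two branches of the minimum."""
    retain = p * e0                 # assumed reading: keep the old partition
    test = Dd + p * (er + Dr)       # assumed reading: take the test decision
    return min(retain, test)


def terminal_threshold(e0, er, Dd, Dr):
    """Probability of change above which the second branch is cheaper at step M.

    Returns None when retaining is never the more expensive branch on [0, 1].
    """
    gain = e0 - er - Dr
    if gain <= 0:
        return None
    threshold = Dd / gain
    return threshold if threshold <= 1.0 else None
```

With the illustrative values e0 = 200, er = 50, Dd = 10 and Dr = 50, the two branches cross at p = 0.1. When the repartitioning delay is at least as large as the gain e0 - er, the second branch is never cheaper at the final step.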

The optimal decision policy's structure suggests that a heuristic be focused on the probability
of change. We have shown already how this probability can be dynamically maintained. The first
task for our heuristic is to determine when to estimate $e_0$ and $e_r$. We want the heuristic to
be responsive to a change, and yet we don't want premature estimations of these execution time
means. Before a change occurs, most change detection tests will report no change, and the
posterior probability of change is calculated using the function $p_c(p)$. In [15] we show that
$p_c(p)$ is a contraction mapping [16], implying that so long as no change has occurred, the
probability of change will be close to the fixed point solution $q = p_c(q)$. If $p_n$ is
significantly greater than this $q$ we can be reasonably sure that a change did occur. Following
this reasoning, we choose an initial threshold $p_e$ so that $e_r$ and $e_0$ are estimated
whenever $p_n > p_e$. The threshold $p_e$ is chosen so that starting with a prior probability
equal to $p_c(p)$'s fixed point solution, three successive positive indications of change are
needed to exceed $p_e$. After estimating $e_0$ and $e_r$, the value of $n_0$ is determined (in
$O(M)$ time). If the current time step $n_e$ exceeds $n_0$, the original partition is retained
for the rest of the computation. Otherwise, a high probability threshold $\hat{p} = 0.8$ is
chosen. We approximate $\pi_n$ for time steps $n$ between $n_e$ and $n_0$ linearly by

$$p_n = \hat{p} + (1 - \hat{p})\,\frac{n - n_e}{n_0 - n_e}.$$


The heuristic policy then mimics the optimal policy, treating each $p_n$ as though it were
$\pi_n$. A premature choice of the test decision causes the heuristic to abandon the $\{p_n\}$,
and wait again for the probability of change to exceed $p_e$ before recalculating the $\{p_n\}$.
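
The threshold construction can be sketched as follows. The update function $p_c(p)$ is derived earlier in the paper and is not reproduced here. The Bayes update below is an assumed stand-in with a per-step change probability `phi`, a false-alarm rate `alpha` and a miss rate `beta`. Placing the estimation threshold midway between the second and third positive updates is likewise only one choice that satisfies the three-indication rule.

```python
def posterior(p, positive, phi, alpha, beta):
    """Assumed Bayes update of the probability of change after one test.

    phi   : assumed per-step probability that the change occurs
    alpha : assumed P(test reports change | no change)
    beta  : assumed P(test reports no change | change)
    """
    prior = p + (1.0 - p) * phi            # the change may occur during this step
    if positive:
        num = prior * (1.0 - beta)
        den = num + (1.0 - prior) * alpha
    else:
        num = prior * beta
        den = num + (1.0 - prior) * (1.0 - alpha)
    return num / den


def fixed_point(phi, alpha, beta, tol=1e-12, max_iter=10_000):
    """Iterate the no-change update until it settles at q = p_c(q)."""
    q = 0.0
    for _ in range(max_iter):
        nxt = posterior(q, False, phi, alpha, beta)
        if abs(nxt - q) < tol:
            return nxt
        q = nxt
    return q


def estimation_threshold(phi, alpha, beta):
    """Pick p_e so that the third successive positive from q is the first to exceed it."""
    p = fixed_point(phi, alpha, beta)
    after = []
    for _ in range(3):
        p = posterior(p, True, phi, alpha, beta)
        after.append(p)
    return 0.5 * (after[1] + after[2])     # between the 2nd and 3rd positive updates


def threshold_schedule(n_e, n_0, p_hat=0.8):
    """Linear thresholds p_n for n_e <= n <= n_0, rising from p_hat to 1."""
    if n_0 <= n_e:
        # No schedule exists; the heuristic keeps the original partition here.
        raise ValueError("n_0 must exceed n_e")
    return {n: p_hat + (1.0 - p_hat) * (n - n_e) / (n_0 - n_e)
            for n in range(n_e, n_0 + 1)}
```

The schedule starts at the high threshold when the estimates are made and reaches 1 at the last step for which repartitioning is still considered, so adopting a new partition becomes progressively harder as that step approaches.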


IX. An Empirical Study

We performed an empirical study comparing this heuristic's performance with the optimal
decision policy. The exact optimal decision policy cannot be calculated except for small M. For
large M we used a computationally expensive approximation to the optimal policy. At every step
in the recursive solution, this approximation retained at most 1024 linear segments of an
approximation to $V_r(\langle p, n+1 \rangle)$. These segments were used to calculate the
approximately 2048 linear segments of (the approximation to) $V_r(\langle p, n \rangle)$. The
1023 segments closest to p = 1 were retained; the remainder of $V_r(\langle p, n \rangle)$ was
approximated by a single line segment extending from p = 0 to the leftmost of the retained
segments. This approximation was designed to closely model the behavior of the optimal cost
function in the region of the optimal policy threshold.
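
The pruning step of this approximation can be sketched as follows. The breakpoint representation and the function name are assumptions made for illustration, as is anchoring the single replacement segment at the function's own value at p = 0; the text does not specify the data structure used.

```python
def prune_segments(points, max_segments=1024):
    """Reduce a piecewise linear function on [0, 1] to at most max_segments pieces.

    points: breakpoints (p, value) sorted by increasing p, from p = 0 to p = 1.
    Keeps the max_segments - 1 segments nearest p = 1 and replaces everything
    to their left with one segment starting at the p = 0 breakpoint.
    """
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    if len(points) - 1 <= max_segments:
        return list(points)
    # max_segments breakpoints delimit max_segments - 1 segments.
    kept = list(points[-max_segments:])
    return [points[0]] + kept
```

Applied after each recursion step, this keeps the representation bounded while preserving full detail near p = 1.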

Our study varied two parameters: N and $G = e_0 - e_r$. For simplicity, N was considered to
be constant. Given values for N and G, the other model parameters had the following values:


Parameter    Value       Parameter    Value
$e_0$        200         $\phi$       1/N
$e_r$        200 - G     $\alpha$     0.2
$D_d$        100         $\beta$      0.05
$D_r$        100

Table II

The rationale behind this quantification of $e_0$, $e_r$, $D_d$, and $D_r$ was that we expect
these values to have the same order of magnitude. The value of $\phi$ was chosen with the
philosophy that if we know very little about a potential change, a reasonable guess is that it is
(approximately) as likely as not that a change will occur. The values of $\alpha$ and $\beta$
model our experience with using the AIC statistic. Our selection of N and G as free parameters
followed from our intuition (borne out by computational experience) that optimal behavior is
influenced most by these two parameters.
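
As a quick check of the choice of 1/N for the change probability parameter $\phi$, suppose (an assumption made here for illustration) that $\phi$ is the probability that the change occurs at any given step, independently across the N steps. The probability that a change occurs at some point during the computation is then

```latex
1 - \left(1 - \frac{1}{N}\right)^{N} \;\approx\; 1 - e^{-1} \;\approx\; 0.63 ,
```

which is about 0.65 for N = 10 and about 0.63 for N = 100 and larger, i.e., roughly "as likely as not".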

We allowed G to take on the values 5, 50, and 100, thereby spanning two orders of magnitude.
N was assigned the values 10, 50, 100, and 1000. We first measured the relative difference
$\%_\pi$ in finishing time between the computation under the approximated optimal policy and
under the policy which always retains, i.e., does nothing. Relative to the total finishing time,
$\%_\pi$ is the maximal percentage gain we can hope to achieve. We then calculated $\%_H$, the
percentage of this maximal gain achieved by our heuristic. Table III contains the results of
these calculations; each measurement is plus or minus 0.5% with a confidence of at least 95%.
N       G    $\%_\pi$       $\%_H$       G     $\%_\pi$   $\%_H$    G      $\%_\pi$   $\%_H$
10      5    [illegible]    [missing]    50    4.7        55.2      100    19.2       75.3
50      5    [illegible]    54.8         50    11.3       93.4      100    32.4       95.5
100     5    [illegible]    82.9         50    12.2       95.1      100    33.9       97.1
1000    5    [illegible]    98.3         50    12.5       99.5      100    34.3       99.5

Table III
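
The two percentages reported in Table III (the maximal gain, and the heuristic's share of it) can be computed from measured mean finishing times as in the sketch below. The function names are hypothetical, and normalizing the maximal gain by the always-retain finishing time is an assumption; the text says only that the gain is relative to the total finishing time.

```python
def max_gain_percent(t_retain, t_optimal):
    """Relative finishing-time gain of the (approximated) optimal policy, in percent."""
    return 100.0 * (t_retain - t_optimal) / t_retain


def heuristic_share_percent(t_retain, t_optimal, t_heuristic):
    """Percentage of the maximal gain that the heuristic achieves.

    Negative when the heuristic finishes later than the always-retain policy.
    """
    return 100.0 * (t_retain - t_heuristic) / (t_retain - t_optimal)
```

Under this definition the heuristic's share can fall below zero, which is how a policy that repartitions at the wrong time shows up as worse than doing nothing.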


We can draw a number of conclusions from Table III. Obviously, when the gain G achiev- 
able by repartitioning is very small, there is very little difference between using the optimal pol- 
icy, and the policy which always retains. If G is substantially larger, we can expect significant 
improvement in completion time by using a good decision policy. Furthermore, our proposed 
heuristic achieves most of the possible repartitioning gain. In fact, the relative difference in 
finishing time between our heuristic and the optimal decision policy is usually a fraction of one 
percent. The length of the computation also has a significant effect on our performance figures.
As N grows, the performance of our heuristic approaches the optimal performance. It therefore
appears that when G is precisely known, our heuristic can be expected to yield nearly optimal
performance.

The experiments summarized by Table III assumed that the heuristic could accurately assess
$e_0$ and $e_r$. Since these quantities are difficult to predict, we tested the heuristic's
performance when it misestimated G. Misestimation of G affects our heuristic by altering the time
step threshold $n_0$, which in turn alters the $\{p_n\}$. We tested the heuristic using the
parameter values illustrated in Table III. For each pair of fixed G and N, we caused the
heuristic policy to misestimate G by factors of $10^{-3}$, $10^{-2}$, $10^{-1}$, $10$, $10^{2}$,
and $10^{3}$. Table IV lists the resulting $\%_H$ as a function of G, N, and the misestimation
factor.

G      N       $10^{-3}$   $10^{-2}$   $10^{-1}$   $10$     $10^{2}$   $10^{3}$
50     10      -7.1        -16.7       -19.4       44.5     48.3       44.1
100    10      5.2         -3.5        0           77.0     77.1       74.3
5      50      35.2        35.1        34.6        -67.1    -97.7      -108.2
50     50      10.0        23.1        33.8        93.9     94.2       92.4
100    50      14.1        27.0        82.0        95.6     96.4       95.1
5      100     11.4        21.7        22.3        49.6     42.9       42.6
50     100     53.1        50.5        86.9        95.2     96.4       96.1
100    100     55.7        53.2        95.7        97.2     97.5       97.6
5      1000    92.7        92.6        95.3        98.4     98.3       98.4
50     1000    94.8        98.0        99.6        99.7     99.7       99.7
100    1000    95.2        99.2        99.6        99.7     99.7       99.7

Table IV

Study of Table IV leads to several observations. Table IV clearly shows that the performance
of the heuristic becomes insensitive to misestimation as N increases. This phenomenon parallels
the observation from Table III that the performance of the heuristic increases with N. Both
observations follow from the fact that the expected total gain from a new partition increases
with N, while the expected costs of not being optimally responsive after a change are constant.
It is also clearly more harmful to underestimate G than it is to overestimate it. When N is
small, underestimation leads to the conclusion that $n_0 < n_e$, so that no new partition is
adopted. The heuristic's costs are somewhat larger in this case than those of the true "always
retain" policy, due to the costs of estimating G. As N grows, it becomes more likely that, even
though G is underestimated, the change occurs soon enough so that a new partition is adopted with
a high enough probability of change. By overestimating G, it becomes possible to adopt a new
partition when the benefits of doing so do not outweigh the costs $D_d$ and $D_r$. This unhappy
situation is realized only when both N and G are small. The most important conclusion we can draw
from Table IV is that for most reasonable values of G and N, gross misestimation of G does not
seriously affect our heuristic's performance. This implies that our heuristic's most critical
feature is its ability to detect change.

X. Multiple Changes

Our decision model presumes that at most one change will occur during the computation. 
We were dissuaded from incorporating multiple changes as we felt that the additional complexity 
might not lead to a better understanding of the problem. This opinion is supported by the 
intractability of finding the optimal policy assuming a single change, and the high performance of 
our simple heuristic. Nevertheless, we should consider how multiple changes might be handled. 

Only minor modifications are required to adapt our heuristic to the possibility of multiple 
changes. These modifications are invoked after a change occurs. Unlike the single change policy, 
the decision process does not stop after adopting a new partition. It remains active, looking for 
the next change. To do this, it must first collect a new base group B of observations which 
characterize the post-change behavior. The test cluster which triggered the new partition might 
be used as this B. The decision policy then continues just as before. The empirical results which
imply that change detection is the most critical element of a repartitioning decision also suggest 
that this approach to multiple changes should work well. 
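
The modified control loop can be sketched as follows. Everything here is a hypothetical stand-in: `detect` plays the role of the change detection test, `update` the posterior calculation, and a single fixed `threshold` replaces the time-varying thresholds of the heuristic for brevity.

```python
def monitor(clusters, base, detect, update, threshold, p_start):
    """Scan test clusters, adopting a new partition every time the probability
    of change exceeds the threshold; return the steps at which that happened
    together with the final base group."""
    p = p_start
    adopted_at = []
    for step, cluster in enumerate(clusters):
        p = update(p, detect(base, cluster))
        if p > threshold:
            adopted_at.append(step)   # a new partition would be computed here
            base = cluster            # triggering cluster becomes the new base group B
            p = p_start               # resume looking for the next change
    return adopted_at, base
```

The only difference from a single-change loop is that the loop does not exit after the first adoption: the base group and the probability of change are reset, and monitoring continues.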

XI. Conclusions 

A good partitioning of a computation across multiple processors must make certain assump- 
tions about the computation’s running behavior. If that behavior were to radically change in the 
middle of the computation¹, those assumptions could be invalidated, so that the partition is no
longer effective in reducing the running time of the computation. We have considered the prob- 
lem of when to reject an old partition, and adopt a new one. We proposed the use of proven sta- 
tistical techniques to detect change in a computation’s stochastic behavior; we modeled the repar- 
titioning decision problem as a Markov decision process. This decision process takes into account 
the critical costs and benefits of repartitioning. We then characterized the decision policy which 
minimizes the expected computation finishing time. While this policy can be intuitively 
described, it is not easily quantified. We thus proposed and studied a heuristic decision policy 
which is modeled after the optimal policy. This heuristic performs remarkably well over a wide 
range of parameter values. Furthermore, it is quite insensitive to misestimation of the
repartitioning gain. This observation leads us to conclude that the ability to detect change is the most
important feature of a repartitioning policy. If we can detect change, and if we can expect to 
achieve more than marginal gain from repartitioning after a change, then we can expect our pro- 
posed policy to achieve most of the possible repartitioning gain. 


1. We are currently considering different models of stochastic behavior in "Load Balancing
Computations with Non-Stationary Behavior", D.M. Nicol and J.H. Saltz, ICASE Report in preparation, 1986.